perm filename CACHE.2[AM,DBL] blob
sn#422947 filedate 1979-03-06 generic text, type C, neo UTF8
\input paper.tex
\TITL{COGNITIVE ECONOMY}
\vfill
\AUTHO{Douglas B. Lenat, \ Stanford University}
\AUTHO{Frederick Hayes-Roth, \ Rand Corporation}
\AUTHO{Phillip Klahr, \ Rand Corporation}
\vfill
\vskip 10pt plus 2pt
\NNSECP{ABSTRACT}
Intelligent systems can explore only tiny subsets of their potential
external and conceptual worlds. They must develop efficient forms
of representation, access, and operation to increase their
capacities. Some of these forms involve abstraction, caching, and
expectation-simplified processing. These capabilities in turn can
combine to provide extremely powerful increases in performance. For
example, all three can combine to simplify simulation or one of its
related functions, the detection of surprising events. Our analysis of
the economic principles of modern AI systems or (presumably more
sophisticated) human intelligence suggests that previous ideas
regarding cognitive efficiency have erred in fundamental ways. For
example, the nonredundant storage of properties in hierarchical
inheritance nets increases many processing costs while providing
minimal storage cost savings. We propose methods to exploit the
potential advantages of both schemes.
\vfill
\eject
\NSECP{INTRODUCTION}
Every scientific theory is constructed in a rich context of surrounding
theories, methods, and standards which determine which experiments are
reasonable ones to perform and which point of view to take when interpreting
the results -- in short, a {\it paradigm}. We feel it useful
to articulate the "hard core" of our paradigm (the assumptions our theory
rests upon) before presenting the problem (Section 3) and principal
ideas for solution (Sections 4-6). Among our assumptions are:
a model for intelligence
(Section 2.1), a model for how intelligent computer programs may be
organized (Section 2.2), and a model of the characteristics of present
computing engines (Section 2.3).
In highly condensed form, our argument proceeds as follows:
(i) We continually face searches in more or less immense spaces; intelligence is
the ability to bring {\it appropriate} knowledge to bear, to speed up such searching.
Increasing intelligence then comprises increasing knowledge, its
organization, and the conditions for its applicability.
How are these new discoveries made? Empirical induction
(generalizing from experimental and other observations), analogy, and the criticism
of existing theory all lead to new conjectures, new possibilities.
(ii) Intelligent systems can make the applicability of their knowledge explicit
by representing that knowledge as condition/action rules (If {\sl situation}
then {\sl appropriate reaction}). Such pattern-directed inference systems (PDIS)
can benefit from a schema representation (frame, unit, being, script, etc.), because
this adds structure which the rules can then take advantage of.
(iii) Computers currently present us with limited cycles, cheap storage,
and expensive knowledge acquisition.
(iv) The problem: We want to develop initial systems quickly, and then have
them speed up and economize their computing, to maximize their potential.
(v) Many AI researchers {\sl cum} language designers have focused on the first
half, resulting in excellent software such as Interlisp's DWIM, File, and
Break Packages. We therefore call attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
Our principal suggestions for meeting this challenge are:
redundant representation of knowledge at
multiple levels of
abstraction, caching of computed results,
and expectation-simplified computing.
\NSECP{THE ASSUMPTIONS}
\SSEC{Our Model of Intelligence}
Many human cognitive activities, including most of those commonly referred
to as "requiring intelligence", can be cast as searches, as explorations through
a search space, meandering toward some goal. We call a problem-solver more
intelligent if he can efficiently work toward a solution even in the face of an
immense search space and an ill-defined goal.
Somehow, he is imposing more structure on the problem, and using that to
search more effectively. How might he do this?
According to our model of human intelligence, he accomplishes his task by
bringing knowledge to bear, knowledge not supplied explicitly in the problem
statement. This knowledge can be about problem-solving in
general (e.g., how to recognize and break cultural blocks)
or about the problem's domain specifically
(e.g., which groups of atoms can usually be treated as superatoms).
It may have been learned long ago, or generated during the
early phase of work on the problem.
This implies that a problem solver can become more effective by
(i) increasing his knowledge, (ii) better organizing his knowledge, and
(iii) refining his criteria for deciding when various pieces of knowledge
are applicable. In terms of production (If/Then) rules, these correspond to
adding new rules, modifying the data structure in which rules are held,
and modifying the conditions (IF parts) of existing rules.
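The three improvement routes above can be sketched as operations on a small rule base. This is a minimal illustration in modern Python, not the paper's own formalism; every rule, name, and task is invented.

```python
# A production rule is a condition (predicate over a situation) plus an
# action (the appropriate reaction). All content here is illustrative.

class Rule:
    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition
        self.action = action

rules = []

# (i) increase knowledge: add a new rule
rules.append(Rule("prime-check",
                  lambda s: s.get("task") == "find-primes",
                  lambda s: "try trial division"))

# (ii) reorganize knowledge: index rules by topic for faster lookup
index = {"find-primes": [rules[0]]}

# (iii) refine applicability: tighten an existing rule's IF part
old = rules[0].condition
rules[0].condition = lambda s: old(s) and s.get("n", 0) > 1

situation = {"task": "find-primes", "n": 7}
fired = [r.action(situation) for r in rules if r.condition(situation)]
```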
How is new knowledge discovered? One route is that of {\it abstraction}: condensing
many experiences into a heuristic which, in hindsight, we see would have
greatly aided us in the past, which would have speeded up our reaching this
state in the search. Closely related is {\it generalization}, often merely
expanding the domain of relevance of a specific bit of knowledge we already
possessed. {\it Analogy} is one of the most useful techniques; one can draw
parallels not merely to other existing facts and heuristics, but also to the
{\sl paths} which led to their discovery (e.g., even if a result in organic
chemistry does not map over into the inorganic world, the experiment which
led you to that first fact may map over into an experiment which {\it will}
reveal a useful fact in inorganic chemistry; even though the analogue of a
mathematical theorem may be false in some new domain, the analogous proof technique
may be useful). Another path to discovery is {\it specialization} of more
general knowledge. Finally, we must mention the process of {\it hypothesis,
criticism, and improved hypothesis}, which is a common and powerful
method of spiralling
in toward precision. In mathematics [Lakatos] and many
scientific disciplines [Musgrave & Lakatos], counterexamples
to current theories arise frequently, and their incorporation often demands a
deepening of the theory, an increase in knowledge.
Experiments such as the AM program [ref] support our assertion that
these methods can adequately
guide even open-ended searches for new definitions and conjectures.
But how can an intelligent system be programmed, how can such systems
be organized?
\SSEC{Our Model of Intelligent Program Organization}
The above remarks about intelligent problem solving apply equally to hardware
and wetware alike. Computer programs which are to be intelligent must
ultimately grapple with the tasks of knowledge acquisition, representation,
and refinement. We cannot provide an absolute answer to how they should be
organized, but we can posit some design constraints which have proven useful
so far.
A very general heuristic in AI programming is the following: If your program
is going to modify its own {\bf X}, then make {\bf X} as separable, clean, and explicit as
possible. In our case, {\bf X} can be instantiated as "knowledge", or as "applicability
conditions for each piece of knowledge". Thus the heuristic advises us to
represent our knowledge in a separate, clean, explicit form, say as knowledge
modules having some fixed structure, and also advises us to keep the applicability
conditions for each knowledge module separate from the rest of the knowledge it
contains.
This naturally leads us to a pattern-directed inference system, in
which knowledge is broken into separate modules, each with an explicit set of
relevancy tests. Such systems arising in Pittsburgh may overemphasize
cleanliness (so-called pure production systems), while those arising in
California may tend to be a bit too laid back (so-called ad hoc expert systems),
but such variations should not obscure their common source of power.
The dominant PDIS architecture has knowledge broken into a set of condition/action
production rules of the form IF {\sl triggering situation} THEN {\sl appropriate
reaction}.
Having a clean representation for rules means having a clean, precise, elegant
language in which to express them. By structuring the conceptual knowledge
of the system, by partitioning each module's knowledge into several categories,
a rule can condense an entire cumbersome description into a single pointer.
The popular schematized representations of knowledge (scripts for episodes,
frames for static situations, beings for specialists, units for everything)
enable a rule like "If there are no known methods specific to finding new examples
of prime numbers, Then..."
to have its condition coded as a simple
null-test on the To-get subslot of the Examples slot of
the schema for Prime Numbers.
By a judicious choice of slot types, the system builder can reduce most
triggering conditions to such quick checks on the state of various (sub)slots
of schemata.
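The Prime Numbers example can be mimicked with nested dictionaries standing in for schemata and their (sub)slots, so that the rule's trigger reduces to a quick null test. This is a sketch of the idea only, not AM's actual representation.

```python
# Schemata as nested dicts: concept -> slot -> subslot -> contents.
# The slot names mirror the example in the text; the data is invented.
schemata = {
    "PrimeNumbers": {
        "Examples": {
            "To-get": [],            # no methods known for finding new examples
            "Known": [2, 3, 5, 7],
        }
    }
}

def no_specific_methods(concept):
    """Condition of the rule 'If there are no known methods specific to
    finding new examples of <concept>, Then ...' coded as a null test
    on the To-get subslot of the Examples slot."""
    return not schemata[concept]["Examples"]["To-get"]

triggered = no_specific_methods("PrimeNumbers")
```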
Additional knowledge is required to efficiently locate all the
rules which {\it might} have their conditions satisfied in a given situation,
and also to decide which rules to actually fire (obey) among those whose
IF parts have actually triggered (been satisfied).
The location of potentially-relevant rules is facilitated by organizing the
rules into some structure. For example, AM organized mathematical concepts
into a generalization/specialization hierarchy, and tied each rule to its most
relevant concept. To find rules potentially relevant to concept C, AM then
needed only to look at the rules tacked onto C, C's generalizations, their
generalizations, and so on. By inheritability, any rule relevant to
judging Objects in general was (potentially) relevant to judging
an Abelian group.
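AM's rippling can be sketched as a walk up the generalization links, collecting the rules tacked onto each concept along the way. The concept chain and rule names below are invented for illustration; AM itself was a Lisp program.

```python
# Generalization links and concept-attached rules (illustrative data).
generalizations = {
    "AbelianGroup": ["Group"],
    "Group": ["AnyStructure"],
    "AnyStructure": ["Object"],
    "Object": [],
}
attached_rules = {
    "Object": ["judge-any-object"],
    "Group": ["check-axioms"],
    "AbelianGroup": ["check-commutativity"],
}

def potentially_relevant(concept):
    """Ripple upward from concept, gathering rules from it, its
    generalizations, their generalizations, and so on."""
    seen, stack, rules = set(), [concept], []
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        rules.extend(attached_rules.get(c, []))
        stack.extend(generalizations.get(c, []))
    return rules

candidates = potentially_relevant("AbelianGroup")
# by inheritability, the Object-level judging rule is included
```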
This requires an explicit focusing of attention, a selection of a "current
topic" C from which the above rippling proceeds. We favor the maintenance
of a separate data structure for this purpose, an agenda of topics to consider,
of subtasks to work on. Knowledge may be brought to bear to select a topic,
then the above quest for potentially relevant rules is carried out, then they
are examined to find the truly relevant satisfied set, and finally some subset
of those are fired off.
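The control cycle just described (select a topic from the agenda, gather potentially relevant rules, test their IF parts, fire some subset) can be sketched as a short loop. The agenda entries, worth values, and rules are all invented.

```python
# Agenda-driven control loop: one cycle of topic selection and rule firing.
agenda = [("PrimeNumbers", 500), ("Sets", 200)]      # (topic, worth)

rules_for = {   # stand-in for the result of rippling from each topic
    "PrimeNumbers": [
        (lambda: True,  "fill in examples of PrimeNumbers"),
        (lambda: False, "generalize PrimeNumbers"),
    ],
    "Sets": [],
}

def one_cycle():
    agenda.sort(key=lambda t: -t[1])     # knowledge selects the topic
    topic, _ = agenda.pop(0)             # explicit focus of attention
    candidates = rules_for[topic]        # potentially relevant rules
    satisfied = [act for cond, act in candidates if cond()]
    return satisfied[:1]                 # fire some subset of those

fired = one_cycle()
```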
\SSEC{Our Model of (Present) Computers}
Since we are considering the problem of building computer models of
intelligent behavior, many of our assumptions deal with the characteristics
of present-day computers. They are the symbol manipulators we use,
but were not designed for that general purpose.
[RICK: FILL THIS SECTION OUT. BASIC IDEA WAS:
Computers currently present us with limited cycles, cheap storage,
uniprocessing, and expensive knowledge acquisition.]
\NSECP{THE PROBLEM}
When we build an AI program, we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral. On the other hand,
we want our systems to run as efficiently as possible, to minimize the
temporal delays, the magnitude of the cpu time required. Quick and easy
implementing is at odds with careful analyses leading to programs which are
optimized, which have maximized their potential.
For example, Stanford's Ray Carhart spent much of 1978 visiting Edinburgh,
rewriting Dendral (actually, its primary module, the CONGEN constrained
generation program) efficiently for a minicomputer. A similar attempt may be
made soon for the Mycin program. Typical running times of several cpu hours are not
extravagant in AI, even though a carefully optimized version might run in
5% of the time. This is because the researcher, forced to decide between
expressibility and economy, chooses the former. It's better to accomplish
something inefficiently than to fail in an attempt which tried to save a few
hours of computation.
Many AI researchers {\sl cum} language designers have focused on the first
half,
the problem of rapidly constructing a working experimental vehicle.
They have produced some
excellent software such as Interlisp's DWIM, File, and
Break Packages [ref].
This paper is attempting to draw attention to the second half, to ways in
which programs can (semi-)automatically improve themselves.
The ideas exist which enable us to build in more "ultimate efficiency"
(e.g., the ability for a program to automatically increase its efficiency,
albeit at a nontrivial cost during the knowledge acquisition phase)
without risking the failure of the entire project.
Some earlier automatic programming systems [Darlington & Burstall, Lenat's PUP6,
others?] were designed to improve programs' efficiency, and many of those
ideas have found their way into the techniques we now describe.
Dave Barstow and Elaine Kant have developed a pair of programs which respectively
(i) explicate the choices for representation, control, etc., and (ii) make those
choices, using a model of the overall structure of the program, the user's intentions,
general analysis-of-algorithms techniques, and other knowledge.
Earlier, Jim Low built a system [ref] which chose
appropriate data structures based upon the flavor of accesses
that were frequently made to a body of information.
These AP systems
had program construction, transformation, and optimization as their primary
task; we are proposing ways to provide other kinds of AI programs
similar self-improving abilities.
Our principal suggestions for meeting this problem are:
(i) redundant representation of knowledge at
multiple levels of
abstraction, (ii) caching of computed results,
and (iii) using predictions to filter away expected, unsurprising data.
\NSECP{ABSTRACTION}
Desirability of being able to compute rough answers cheaply
Conceptual hierarchies
Heuristic Hierarchies
Interpretation and planning at levels of abstraction
E.g., rules of bomber simulation at different levels
(this example will ultimately be used to suggest
caching for simplification)
Related research
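One way to realize "rough answers cheaply" is to keep the same question answerable at more than one level of abstraction, descending to the expensive level only when precision is demanded. The bomber-simulation numbers and function names below are invented; this is a sketch of the redundant-representation idea, not of any actual simulator.

```python
import random

def coarse(n_bombers):
    """Cheap abstract-level rule of thumb: 'about 80% get through'."""
    return n_bombers * 8 // 10

def detailed(n_bombers):
    """Expensive event-level simulation, stubbed as per-bomber coin flips."""
    return sum(random.random() < 0.8 for _ in range(n_bombers))

# Redundant representation: the same question at two cost/precision levels.
levels = {"coarse": coarse, "detailed": detailed}

def answer(n_bombers, need_precision=False):
    """Use the cheapest level adequate for the requestor."""
    name = "detailed" if need_precision else "coarse"
    return name, levels[name](n_bombers)

rough = answer(100)   # no event-level simulation is run
```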
\NSECP{CACHING}
Modifying memory to save computed results to speed subsequent accesses
Generalization of hardware concept
EURISKO types of caching, as first examples
Contrast with psychological conjectures of cognitive economy
(e.g., Collins \& Quillian, KRL, ...). More like $HR^2$ Plasticity
model of storing all retrieved paths as direct links
General principles
Updating Principles
-------------------
When
Why
How
set demon traps that flag the cache as out of date
the user requests updating if the cache seems stale
Where
In what form
What
When not to
How to
Storage Principles
------------------
When
Every time you have to call a lower-order function to evaluate it
& it took quite a while.
You've called it before, recently, & the value didn't change.
Why
Cost of recomputing vs. cost of storage.
Context of subsequent calls is similar enough (e.g., the same
arguments will come up again).
How
Called functions might suggest how to cache their value in higher
calling caches (e.g., my value changes often so cache my defn.).
Cache should be transparent & discardable (should be able to throw
them all away if space needed).
Where
In what form
value; at what level of abstraction (partially evaluated
expression, symbolic expression)
Stack previous values to enable you to tell if they're changing.
What
You store a flag saying you've been here before.
When it was computed.
How much effort was expended on it, down to what levels of
algorithms, with what around caches incorporated.
Certainty of the result.
When not to
The value changes too frequently.
The function evaluates as fast as the caching mechanism itself
Space is too tight
How to eliminate caches
Space tight --> eliminate least-recently-used caches (last referenced)
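The storage and updating principles above can be collected into one sketch: a transparent, discardable cache that records when a result was computed and what it cost, stores it only when recomputation was expensive, exposes a staleness hook for demon traps, and evicts least-recently-referenced entries when space is tight. The class and its thresholds are invented for illustration.

```python
import time
from collections import OrderedDict

class Cache:
    """Transparent, discardable cache: every entry records its value,
    when it was computed, and how much effort it cost. All entries can
    be thrown away at any time without changing results."""

    def __init__(self, max_entries=128, min_cost=0.01):
        self.store = OrderedDict()       # insertion/access order -> LRU eviction
        self.max_entries = max_entries   # when not to: space is too tight
        self.min_cost = min_cost         # when not to: faster than the cache itself

    def call(self, fn, *args):
        key = (fn.__name__, args)
        entry = self.store.get(key)
        if entry is not None and not entry["stale"]:
            self.store.move_to_end(key)             # mark as recently referenced
            return entry["value"]
        t0 = time.perf_counter()
        value = fn(*args)
        cost = time.perf_counter() - t0
        if cost >= self.min_cost:                   # only cache costly results
            self.store[key] = {"value": value, "cost": cost,
                               "when": t0, "stale": False}
            if len(self.store) > self.max_entries:
                self.store.popitem(last=False)      # evict least recently used
        return value

    def flag_stale(self, fn_name):
        """Demon-trap hook: flag cached values of fn_name as out of date."""
        for key, entry in self.store.items():
            if key[0] == fn_name:
                entry["stale"] = True

cache = Cache(min_cost=0.0)    # threshold disabled so this demo caches everything

def sq(n):
    return n * n

v = cache.call(sq, 12)    # computed and stored
w = cache.call(sq, 12)    # retrieved from the cache
```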
\NSECP{EXPECTATION-SIMPLIFIED PROCESSING}
Central notion: reserve your computing for opportunities to realize
potential for expanding knowledge
You may decide how much to expend re-confirming the expected
Reductions realizable through expectations:
Perceptual set: see what you expect with less effort
Surprise: heighten sensitivity to data inconsistent with
expectations
Predict and prepare
What mechanisms are implicated?
Caching
PDMs (as triggers or demons)
Relevance to learning
Confirm or disconfirm predictors
This requires setting up PDMs to fire on dis/confirmation
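A minimal sketch of the mechanism: data matching the prediction is waved through with little effort (perceptual set), while a demon fires only on disconfirming data, heightening sensitivity exactly where knowledge might expand. The readings and tolerance are invented.

```python
# Filter a data stream through an expectation: confirmations are cheap,
# surprises trigger a demon for deeper processing. All numbers invented.

expected_altitude = 30000     # prediction from the world model
tolerance = 500

surprises = []

def surprise_demon(reading):
    """Fires only on data inconsistent with expectations; here it just
    records the datum for fuller examination later."""
    surprises.append(reading)

def observe(reading):
    if abs(reading - expected_altitude) <= tolerance:
        return "confirmed"        # perceptual set: minimal effort spent
    surprise_demon(reading)       # disconfirmation: heighten sensitivity
    return "surprise"

results = [observe(r) for r in (30100, 29800, 24000, 30050)]
```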
\NSECP{COGNITIVE ECONOMY REVISITED}
Sample problem: using a world model (simulator) to answer questions
(e.g., what'd happen if 100 bombers went in there?)
Representation of this knowledge as PDMs at different levels of abstraction
Ability to generalize and cache results at one level at the
next higher level,
e.g., either as numerical tables, statistical distributions, or
symbolic expressions
Ability to answer some questions appropriately for the requestor
at a high level of abstraction
KB Design
One good reason to use inheritance is to speed knowledge
implementation, not computing performance
Using the system should result in its speedup
Storage should be cheap
Machine architecture
PDI should be cheap
PDMs should be scheduled with variable resources and
should be able to spend effort accordingly
How could propagation of changes be made efficient?
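The note that "using the system should result in its speedup" combines the inheritance and caching themes: store each property nonredundantly at its most general concept, but let every successful lookup cache a direct copy at the asking concept, trading cheap storage for faster later access. The concept chain and property names below are invented.

```python
# Nonredundant storage plus lookup-time caching: a property lives once,
# at the most general concept, but each retrieval copies it down so
# later accesses are direct. All concepts and properties are invented.

parents = {"Canary": "Bird", "Bird": "Animal", "Animal": None}
props = {"Animal": {"breathes": True}, "Bird": {"flies": True}, "Canary": {}}

def lookup(concept, prop):
    """Ripple up the hierarchy; cache the found value at the asker.
    Returns (value, number of concepts examined)."""
    steps, c = 0, concept
    while c is not None:
        steps += 1
        if prop in props[c]:
            props[concept][prop] = props[c][prop]   # redundant copy = cache
            return props[c][prop], steps
        c = parents[c]
    return None, steps

first = lookup("Canary", "breathes")    # ripples up three levels
second = lookup("Canary", "breathes")   # now answered directly
```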
\NNSECP{Acknowledgements}
We did this all ourselves and have absolutely no intellectual debts
to anyone. All previous work is irrelevant. So there!
\vfill\end